Value Function Approximation and Policy Performance
Abstract
Fig. 1 gives a geometric interpretation of value function approximation. We may think of J* as a vector in ℜ^|S|; by considering approximations of the form J̃ = Φr, we restrict attention to the hyperplane J̃ = Φr in the same space. Given a norm ‖·‖ (e.g., the Euclidean norm), an ideal value function approximation algorithm would choose r minimizing ‖J* − Φr‖; in other words, it would find the projection Φr* of J* onto the hyperplane. Note that ‖J* − Φr*‖ is a natural measure of the quality of the approximation architecture, since it is the best approximation error that can be attained by any algorithm given the choice of Φ. Algorithms for value function approximation found in the literature do not compute the projection Φr*, since this is an intractable problem. Building on the knowledge that J* satisfies Bellman's equation, value function approximation typically involves adapting exact dynamic programming algorithms. For instance, drawing inspiration from value iteration, one might consider the following approximate value iteration algorithm: Φr_{k+1} = Π T Φr_k, where Π is a projection operator that maps T Φr_k back onto the hyperplane. Faced with the impossibility of computing the "best approximation" Φr*, a relevant question for any value function approximation algorithm A generating an approximation Φr_A is how large ‖J* − Φr_A‖ is in comparison with ‖J* − Φr*‖. In particular, it would be desirable that, if the approximation architecture...
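To make the iteration Φr_{k+1} = Π T Φr_k concrete, here is a minimal sketch in Python. The small randomly generated MDP, the feature matrix Phi, and the restriction to a single fixed policy (so the Bellman operator T is linear) are all illustrative assumptions, not taken from the text; Π is implemented as Euclidean (least-squares) projection onto the span of Φ.

```python
import numpy as np

# Illustrative setup (assumed, not from the text): a 10-state MDP under a
# fixed policy, with transition matrix P, cost vector g, discount alpha,
# and a 3-column feature matrix Phi.
rng = np.random.default_rng(0)
n_states, n_features = 10, 3
alpha = 0.9

P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)      # make rows stochastic
g = rng.random(n_states)
Phi = rng.random((n_states, n_features))

def T(J):
    """Bellman operator for the fixed policy: (TJ)(x) = g(x) + alpha * E[J(x')]."""
    return g + alpha * P @ J

def Pi(J):
    """Euclidean projection onto the span of Phi, via a least-squares fit."""
    r, *_ = np.linalg.lstsq(Phi, J, rcond=None)
    return Phi @ r

# Approximate value iteration: Phi r_{k+1} = Pi T Phi r_k.
# Note the composed operator Pi T is not a contraction in general,
# so convergence is not guaranteed; it does converge on this example.
J_tilde = np.zeros(n_states)
for _ in range(200):
    J_tilde = Pi(T(J_tilde))

# Exact fixed point of T for comparison: J = (I - alpha P)^{-1} g,
# and the best approximation Phi r* = Pi(J).
J_exact = np.linalg.solve(np.eye(n_states) - alpha * P, g)
best = Pi(J_exact)
print("||J - Phi r_A||:", np.linalg.norm(J_exact - J_tilde))
print("||J - Phi r*|| :", np.linalg.norm(J_exact - best))
```

Comparing the two printed norms mirrors the question posed in the abstract: how much worse the error of the computed approximation Φr_A is than the best attainable error ‖J − Φr*‖ for this choice of Φ.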
Similar resources
Debt Collection Industry: Machine Learning Approach
Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. More precisely, we create a frame...
Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action value function, Q. Theory is presented showing that linear function approximation representations of Q can degrade the rate of convergence of performance gradient estimates by a factor of O(ML) relative to when no func...
Localizing Policy Gradient Estimates to Action Transitions
Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster ...
Verification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation
Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation of nonlinear mathematical functions. However, the majority of these models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...
Minimizing a General Penalty Function on a Single Machine via Developing Approximation Algorithms and FPTASs
This paper addresses the Tardy/Lost penalty minimization problem on a single machine. Under this penalty criterion, if the tardiness of a job exceeds a predefined value, the job is lost and penalized by a fixed value. Besides its application to real-world problems, the Tardy/Lost measure is a general form of popular objective functions such as weighted tardiness, late work, and tardiness with reje...